ABSTRACT

We show that neural network classifiers can help to discriminate Higgs
production from background at the LHC in the Higgs mass range $M_H \sim 200$ GeV.
We employ a common feed-forward neural network trained by the backpropagation
algorithm for off-line analysis and the neural chip Totem, trained by
the Reactive Tabu Search algorithm, which could be used for on-line analysis.



1 Introduction 



The main purpose of the future Large Hadron Collider (LHC) at CERN is the search for
the Higgs boson H, the only particle predicted by the Standard Model of the electroweak
interactions that has not been discovered yet. The discovery of the Higgs particle would
be of paramount importance in confirming the peculiar feature of the electroweak vacuum
embodied in the spontaneous breaking of the $SU(2)_L \times U(1)$ electroweak symmetry; on
the other hand, its absence from the physical spectrum would certainly pave the way
for exciting new physics, be it in the form of supersymmetry, or theories with a strongly
interacting Higgs sector [1], or something else.

The large theoretical and experimental effort expected in the next few years
is motivated not only by what is at stake, but also by the fact that, as the Standard
Model predicts and detailed studies have confirmed [2], the signal, i.e. events characterized
by the production of the Higgs boson, will be overwhelmed by background events with
multi-hadron production induced by the strong interactions of quarks and gluons. For this
reason, a crucial step in the implementation of the LHC programme will be
data analysis, which will be asked to disentangle Higgs events from the huge background. It
is not our aim to review here the actual experimental set-up of the LHC experiments,
nor to examine the performance of the LHC detectors: the purpose of this letter is simply
to suggest that part of the task of extracting the signal from the noise could be accomplished by
Artificial Neural Networks (ANN)$^1$.

The role of ANN in high energy physics experiments has been stressed in a number 
of papers and we refer the interested reader to the existing literature ||, [7| || || [10] ; 



in general, when compared with traditional methods of statistical discrimination, they
offer the advantage of possible on-line implementation and often better results in terms of
purity and efficiency [11]. These latter features have been observed already in preliminary
studies on the use of ANN for Higgs search at LHC 0. The present analysis differs from 
these studies for two reasons: 

i) we adopt a more appropriate choice of the input variables; 

ii) we make a comparison between two implementations of ANN in the feed-forward
architecture; namely, a simulated neural network trained by the usual backpropagation
algorithm [12] is compared to a hardware realization of a low-precision-weight
Multi-Layer Perceptron (MLP), the neurochip Totem, whose training-by-example task is
accomplished by a derivative-free combinatorial optimization algorithm called Reactive
Tabu Search (RTS) [14, 15].



A detailed presentation of the two NN will be given in Sections 2 and 3. Here we 
conclude this introduction by stressing the limits and some of the general features of our 
analysis. 

1 For a general introduction to ANN see [3].

First of all we do not investigate the whole expected Higgs mass ($M_H$) range. From
LEP data a lower bound for the Higgs mass is known: $M_H > 60$ GeV [16]. Theoretical
arguments based on unitarity or on the applicability of perturbation theory indicate
that $M_H$ should not be larger than ~ 800 GeV [17]; as for the analyses based on studies of
radiative corrections, they still appear inconclusive, due to the weak dependence of these
effects on $M_H$. Since we mainly wish to present some case studies, rather than to make
an extensive review, we limit our analysis to the mass values $M_H$ = 150 and 200 GeV.
In a range around these Higgs mass values the preferred decay channel, as indicated by
previous studies 0, 0, is the following one: 

$pp \to H X \to 4\mu X$ . (1.1)



For considerably larger Higgs masses (> 400 GeV) the events (1.1) would be clearly
distinguishable by means of the peak in the four-muon invariant mass. In our case, however,
the signal is expected to be overwhelmed by two main sources of background, namely the
$t\bar{t}$ production:

$pp \to t\bar{t}\,X \to \mu^+\mu^-\mu^+\mu^-\,X'$ , (1.2)

with the 4 muons arising from semileptonic decays of the top and antitop, and the $Zb\bar{b}$
production:

$pp \to Z\,b\bar{b}\,X \to \mu^+\mu^-\mu^+\mu^-\,X'$ , (1.3)

with a muon pair arising from $Z^0$ decay and the other one from semileptonic $b$ and
$\bar{b}$ decays. It should be observed that, due to the actual value of the top quark mass
($M_t = 175 \pm 12$ GeV [18]), the processes (1.2) and (1.3) have comparable cross sections;
for their calculation we rely in this paper on the Pythia Montecarlo code [19]. At the
LHC energy ($\sqrt{s} = 14$ TeV) one has$^2$:

$\sigma(pp \to t\bar{t}X \to \mu^+\mu^-\mu^+\mu^- X') = 7.8 \times 10^{-9}$ mb , (1.4)
$\sigma(pp \to Z^0 b\bar{b}X \to \mu^+\mu^-\mu^+\mu^- X') = 6.0 \times 10^{-9}$ mb . (1.5)

These figures should be compared to the computed cross section for Higgs production and
subsequent decay into four muons:

$\sigma(pp \to HX \to ZZ^*X \to \mu^+\mu^-\mu^+\mu^- X) = 1.2 \times 10^{-12}$ mb (1.6)

for $M_H = 150$ GeV;

$\sigma(pp \to HX \to ZZX \to \mu^+\mu^-\mu^+\mu^- X) = 2.8 \times 10^{-12}$ mb (1.7)

for $M_H = 200$ GeV. The main difference between the two cases is that for $M_H = 200$ GeV
the two $Z$s are real, while for $M_H = 150$ GeV only one is real, the other being virtual.
As a consequence, in the latter case the constraint $M_{\mu^+\mu^-} = M_Z$ does not hold for one of
the muon pairs.

For the use of ANNs in high energy physics a crucial point is the choice of sensible
physical observables. On the basis of previous studies 0, 0] the four final muons are ordered
according to their energy, and the following 10 variables $X_1, ..., X_{10}$ are introduced:



$X_1 - X_4$ : the transverse momenta of the four muons. The distributions of these
variables for background events, as simulated by the Pythia Montecarlo, show a
maximum close to zero for those muons coming from quark fragmentation, while the
signal distributions show a peak around 25 - 50 GeV; a similar distribution is found
for the two muons deriving from $Z$ decay in the $Zb\bar{b}$ background events;



2 We notice that in some previous studies, performed before the discovery of the top quark, smaller
values of $M_t$, e.g. 130 GeV, were in general adopted, which resulted in a higher cross section for the
process (1.2) and a negligible one, in comparison, for the process (1.3).
$X_5 - X_8$ : the invariant masses of the four different $\mu^+\mu^-$ pairs. For $M_Z < M_H < 2M_Z$,
two of the distributions for the signal events show a peak around the $Z^0$ mass, that
is absent for the background. The peaks arise from muons coming from the real $Z^0$
decay; there are two of them since the ordering based on the energy partly mixes the events
from the two $Z^0$. For $M_H > 2M_Z$, of course, all the four distributions exhibit the
$Z^0$ mass peak;

$X_9$ : the four-muon invariant mass;

$X_{10}$ : the hadron multiplicity related to hard jets.
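
As an illustration, the first nine variables could be computed from the muon four-momenta along the following lines. This is only a minimal sketch: the input format (tuples of energy, momentum components and charge), the function names and the order in which the four opposite-charge pairs are listed are assumptions made here, not taken from the analysis itself. $X_{10}$ requires hadron-level information and is not computed.

```python
import itertools
import math

def pt(p):
    """Transverse momentum of a four-vector (E, px, py, pz)."""
    return math.hypot(p[1], p[2])

def inv_mass(*vectors):
    """Invariant mass of the sum of four-vectors (E, px, py, pz)."""
    e, px, py, pz = (sum(v[k] for v in vectors) for k in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against small negative rounding

def muon_features(muons):
    """Return [X1, ..., X9] for four muons given as (E, px, py, pz, charge).

    Assumes two positive and two negative muons, so that exactly four
    opposite-charge pairs exist (hypothetical input convention).
    """
    # Order the muons by decreasing energy, as described in the text.
    ordered = sorted(muons, key=lambda m: m[0], reverse=True)
    x1_4 = [pt(m) for m in ordered]
    # The four mu+ mu- pairs, listed in the order of the energy ranking.
    pairs = [(a, b) for a, b in itertools.combinations(ordered, 2)
             if a[4] * b[4] < 0]
    x5_8 = [inv_mass(a[:4], b[:4]) for a, b in pairs]
    x9 = inv_mass(*(m[:4] for m in ordered))
    return x1_4 + x5_8 + [x9]
```

For a toy event made of two back-to-back massless pairs with energies 50 and 40 GeV, the sketch returns four transverse momenta, four pair masses and a four-muon mass of 180 GeV.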



A comment on the variable $X_{10}$ is in order. We expect that hadrons generated by hard
parton scattering are more copiously produced by the process (1.3) and especially (1.2)
as compared to (1.1). However such a peculiar feature of the events (1.1) is hidden in the
huge number (typically several hundreds at the LHC energy) of hadrons produced by the
hadronization of the two beam jets. The remnants of the disintegration of the two beams could
be eliminated in the LHC experimental conditions by appropriate cuts in the angular
variables, but in our simulations we choose to pre-process the data by the so-called $k_\perp$
clustering algorithm [20, 21]. This algorithm consists in general of two steps. In the first
step one compares



$d_{ij} = 2 \min\{E_{Ti}^2, E_{Tj}^2\} [(\eta_i - \eta_j)^2 + (\phi_i - \phi_j)^2]$ (1.8)

and

$d_{iB} = E_{Ti}^2$ , (1.9)

where $E_{Ti}$ is the transverse energy of the $i$-th particle with respect to the beam direction,
$\eta_i$ is its pseudorapidity and $\phi_i$ is the azimuthal angle with respect to the beam axis: a
final state particle $i$ is attributed to the beam remnants (beam jet) if $d_{iB}$ is smaller than
$d_{ij}$, otherwise it is attributed to a hard jet. In the second step, which is not of interest
here, the particles belonging to hard jets are divided into different clusters$^3$. After the
application of the $k_\perp$ algorithm and the removal of the hadrons belonging to the beam
jets, the remaining hadron multiplicity is what we call $X_{10}$. The relevance of the variable
$X_{10}$ can be seen from Fig. 1, where we compare its distributions for the processes
(1.1), (1.2) and (1.3).
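
The first step described above can be sketched as a single pass over the final-state particles. This is a simplified illustration under stated assumptions, not the full iterative $k_\perp$ algorithm of the cited references: each particle is compared with the smallest $d_{ij}$ over all other particles, the prefactor of Eq. (1.8) is kept as an adjustable parameter, and the function name and input format are hypothetical.

```python
import math

def split_beam_and_hard(particles, prefactor=2.0):
    """Single-pass sketch of the beam-jet assignment.

    particles: list of (E_T, eta, phi) tuples (hypothetical format).
    Returns (beam_indices, hard_indices).
    Assumption: particle i goes to the beam jet if d_iB is smaller
    than the minimum of d_ij over all other particles j.
    """
    beam, hard = [], []
    for i, (et_i, eta_i, phi_i) in enumerate(particles):
        d_ib = et_i ** 2                      # Eq. (1.9)
        d_min = math.inf
        for j, (et_j, eta_j, phi_j) in enumerate(particles):
            if j == i:
                continue
            dphi = abs(phi_i - phi_j)
            if dphi > math.pi:                # wrap the azimuthal difference
                dphi = 2.0 * math.pi - dphi
            d_ij = prefactor * min(et_i, et_j) ** 2 * (
                (eta_i - eta_j) ** 2 + dphi ** 2)   # Eq. (1.8)
            d_min = min(d_min, d_ij)
        (beam if d_ib < d_min else hard).append(i)
    return beam, hard
```

In this sketch the multiplicity entering $X_{10}$ would be the length of the second list, restricted to hadrons.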

Having defined the input variables, we now discuss the analyses performed on the 
Montecarlo data using the two neural networks. 



2 Analysis by simulated ANN trained through the 
backpropagation algorithm 

First we discuss the results obtained using a simulated net in the most common architecture
adopted in high energy applications, i.e. the feed-forward MLP, trained through the
"classical" backpropagation algorithm. The net is composed of one input layer with 10
neurons $x_j$, one hidden layer with 21 neurons $z_j$ and one output unit $y$.

3 For another application of the $k_\perp$ algorithm in the context of ANN studies of high energy
physics experiments see [22].
The physical observables introduced above, once normalized to the interval [-1, 1],
become the inputs $x_j$ ($j = 1, ..., 10$) of the NN classifier. Each pattern-event $p$ consists of
the array $x_j$ of the input variables (features) and the value $y$ of the output neuron ($y = 1$
for the signal, i.e. the Higgs production, and $y = 0$ for the background). The patterns
have been divided into two sets, the training set, used by the network to learn, and the
testing set, used to evaluate subsequently its performance.
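
A forward pass through a 10-21-1 network with inputs normalized to [-1, 1] can be sketched as follows. The sigmoid activation, the random initial weights, the common normalization range and the toy input values are all assumptions chosen for illustration; the text does not specify them.

```python
import math
import random

def normalize(value, lo, hi):
    """Map a value from [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def sigmoid(a):
    """Logistic activation (an assumed choice, giving outputs in (0, 1))."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """10-21-1 forward pass: x has 10 entries, w_hidden is 21 x 10."""
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * zi for w, zi in zip(w_out, z)) + b_out)

# Hypothetical untrained weights and a toy event, for illustration only.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(21)]
b_hidden = [0.0] * 21
w_out = [rng.uniform(-0.5, 0.5) for _ in range(21)]
raw = (30, 25, 20, 15, 91, 60, 55, 91, 200, 12)
x = [normalize(v, 0.0, 250.0) for v in raw]
y = forward(x, w_hidden, b_hidden, w_out, 0.0)
```

With trained weights, an output close to 1 would indicate a signal-like event and an output close to 0 a background-like one.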

As already mentioned, our simulations have been obtained by the Pythia Montecarlo
code [19]. We have treated the case of two possible values for the mass of the Higgs
particle: one below $2M_Z$, i.e. 150 GeV, and one just above, i.e. 200 GeV. For each of
these mass values the training data set consisted of $N = 2000$ signal events, 2000 $t\bar{t}$ and
2000 $Zb\bar{b}$ events, while the testing data set consisted of 2000 $pp \to HX \to 4\mu X$ signal
events, $5.6 \times 10^6$ $t\bar{t}$ and $4.2 \times 10^6$ $Zb\bar{b}$ background events for the case $M_H = 200$ GeV.
The data in the training sets were all different from those in the testing sets.
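
The size of the testing background samples appears to be consistent with scaling the 2000 signal events by the ratios of the cross sections (1.4), (1.5) and (1.7). This reading is an inference from the numbers rather than a statement of the text; the short check below, with hypothetical variable names, shows the arithmetic.

```python
# Cross sections in mb, as quoted in Eqs. (1.4), (1.5) and (1.7).
SIGMA_TT = 7.8e-9
SIGMA_ZBB = 6.0e-9
SIGMA_H200 = 2.8e-12
N_SIGNAL_TEST = 2000

def scaled_background(n_signal, sigma_signal, sigma_background):
    """Background events expected for n_signal signal events
    at equal integrated luminosity."""
    return n_signal * sigma_background / sigma_signal

n_tt = scaled_background(N_SIGNAL_TEST, SIGMA_H200, SIGMA_TT)
n_zbb = scaled_background(N_SIGNAL_TEST, SIGMA_H200, SIGMA_ZBB)
print(f"t tbar: {n_tt:.2e}, Z b bbar: {n_zbb:.2e}")
```

The two numbers come out close to the quoted $5.6 \times 10^6$ and $4.2 \times 10^6$, which corresponds to a signal fraction of about 2 in $10^4$ in the testing set.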

As usual, the performance of the network has been evaluated by introducing two
variables: the purity ($P$) and the efficiency ($\eta$), defined as follows:

$P = \frac{N_H^a}{N_H^a + N_B^a}$ (2.1)

and

$\eta = \frac{N_H^a}{N_H}$ , (2.2)

where $N_H$ is the total number of Higgs events in the testing sample, $N_H^a$ is the total number
of the accepted (i.e. correctly identified) Higgs events and $N_B^a$ is the total number of the
accepted background events, i.e. events that are incorrectly identified as Higgs events.

One can increase the purity at the expense of the efficiency by introducing a threshold
parameter $l \in [0, 1]$ as follows. The range of values of the output neuron $y^{(p)}$ in the testing
phase is divided into the subintervals $I_1 = [0, 1 - l]$ and $I_2 = ]1 - l, 1]$, so that if $y^{(p)} \in I_1$
(respectively $\in I_2$) the event is classified as background (respectively: signal).
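
The definitions (2.1) and (2.2), together with the threshold rule, translate directly into a few lines of code. The sketch below uses a hypothetical function name and toy outputs; it is meant only to make the trade-off between purity and efficiency explicit.

```python
def purity_efficiency(outputs, labels, l):
    """Purity and efficiency for threshold parameter l in [0, 1].

    outputs: network outputs y in [0, 1]
    labels:  1 for signal (Higgs), 0 for background
    An event is accepted as signal if y lies in ]1 - l, 1].
    """
    accepted = [lab for y, lab in zip(outputs, labels) if y > 1.0 - l]
    n_h_acc = sum(accepted)                 # accepted Higgs events
    n_b_acc = len(accepted) - n_h_acc       # accepted background events
    n_h = sum(labels)                       # all Higgs events in the sample
    purity = n_h_acc / (n_h_acc + n_b_acc) if accepted else float("nan")
    efficiency = n_h_acc / n_h
    return purity, efficiency

# Toy example: three signal and three background events.
toy_outputs = [0.9, 0.8, 0.6, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
```

On the toy sample, lowering $l$ from 0.5 to 0.25 raises the purity from 0.75 to 1 while the efficiency drops from 1 to 2/3.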

Our results are reported in Fig. 2 (dots); it shows that in the case of $M_H = 200$
GeV one can reach appreciable values of purity. The situation is less favorable in the case
of $M_H = 150$ GeV, when, due to the virtuality of one of the two $Z$s, the reduction of
efficiency is significant.



3 Analysis using the RTS training algorithm as
implemented on the neurochip Totem

One of the purposes of the present work is to help clarify the possibility of using
neural network classifiers in time-critical operations, like the fast triggering required in
some high energy experiments, without losing high quality performance. The neurochip
Totem has been conceived to implement Multi-Layer Perceptrons in the feed-forward
architecture on the basis of a simple and fast computational structure [13]. This is achieved
by avoiding derivative calculations and turning the task of training-by-examples
into a combinatorial optimization problem, whose solution is then searched for by means of
the Reactive Tabu Search method [14, 15]. Unlike the derivative-based backpropagation
algorithms, RTS thus allows simple and low-precision computation, using
only up to 8 bits for the synaptic weights and 16 bits to represent the feature
parameters$^4$: this is indeed the basis of the simple and fast computational structure mentioned above.
Totem can be set to different feed-forward MLP architectures and for the present work
it has been given exactly the same 10-21-1 architecture as the network described in the
preceding section. We have used the same simulated Pythia Montecarlo data as before,
as well as the same overall procedure (not the algorithm, evidently) for training and
testing. A remarkable difference is that now we represent the physical observables by
five-decimal-digit integers, as opposed to the double precision floating point representation needed
for the backpropagation NN. The truth value for a Higgs event was fixed to 8192 and to
0 for a background event, while the threshold parameter controlling the purity level was
varied in steps of unity between the two truth values. The results in terms of purity and
efficiency are collected in Fig. 2 for the case of 8-bit synaptic weights and a Higgs mass of
200 GeV (the data are represented by stars).
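
To give a feeling for what such a low-precision integer representation involves, the sketch below rounds weights to 8-bit signed integers and features to 16-bit signed integers, and applies an integer threshold between the two truth values. The actual number formats and scalings used by Totem are not specified here, so every mapping, name and default value in this example is purely hypothetical.

```python
TRUTH_SIGNAL = 8192      # truth value for a Higgs event (from the text)
TRUTH_BACKGROUND = 0     # truth value for a background event

def quantize_weight(w, w_max=1.0, bits=8):
    """Round a real weight in [-w_max, w_max] to a signed integer.

    Hypothetical symmetric scheme: 8 bits give integers in [-127, 127].
    """
    levels = 2 ** (bits - 1) - 1
    clipped = max(-w_max, min(w_max, w))
    return round(clipped / w_max * levels)

def quantize_feature(x, scale=32767):
    """Map a feature normalized to [-1, 1] onto a 16-bit signed integer
    (at most five decimal digits). The scale is an assumption."""
    return round(max(-1.0, min(1.0, x)) * scale)

def classify(output, threshold):
    """Integer output above the threshold is taken as signal."""
    return "signal" if output > threshold else "background"
```

Scanning `threshold` in unit steps from `TRUTH_BACKGROUND` to `TRUTH_SIGNAL` would then play the same role as varying the parameter $l$ in the preceding section.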



4 Conclusions 

Neural networks have a clear advantage over traditional statistical methods, since they
can support a high degree of parallelism and could be used for on-line analysis of the
experimental data. Therefore their use in the future LHC experiments should be seriously
considered and thoroughly investigated. We have contributed to this analysis by considering
two different nets. The first one is a simulated ANN trained by the backpropagation
algorithm. The second one is a hardware implementation of a fast NN, the neurochip
Totem.

Our results show that NN can be helpful in the discrimination of background events
from the signal in the Higgs search at the future Large Hadron Collider to be built at
CERN. We have shown this by considering one particular Higgs decay channel ($H \to 4\mu$)
in the mass range $M_H \in (150 - 200)$ GeV and including the most relevant backgrounds:
$t\bar{t}$ and $Zb\bar{b}$. For both neural nets, the case $M_H = 200$ GeV is more favourable,
and acceptable values of purity and efficiency can be obtained; in particular the neural
chip Totem produces in general better performance and, in view of its possible on-line
implementation, should be seriously considered, in our opinion, as a tool for the analyses
to be performed at the future Large Hadron Collider at CERN.

Acknowledgments. We wish to thank G. Marchesini for most useful comments and 
P. De Felice and G. Pasquariello for their collaboration at an early stage of this work. 



4 For a comparison in performance between Totem and backpropagation-based neurochips see for
instance [23].






References 



[1] R. Casalbuoni, S. De Curtis, D. Dominici and R. Gatto, Nucl. Phys. B 282 (1987)
235; R. Casalbuoni, P. Chiappetta, D. Dominici, F. Feruglio and R. Gatto, Phys.
Lett. B 269 (1991) 361.

[2] D. Froidevaux, in Proc. of Large Hadron Collider Workshop, Eds. G. Jarlskog and
D. Rein, CERN 90-10 and ECFA 90-133, Vol. II, pag. 444; A. Nisati, ibid. pag. 492;
M. Della Negra et al., ibid. pag. 509.

[3] J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural
Computation (Addison-Wesley) (1991).

[4] L. Lönnblad, C. Peterson and T. Rögnvaldsson, Phys. Rev. Lett. 65 (1990) 1321.

[5] L. Lönnblad, C. Peterson and T. Rögnvaldsson, Nucl. Phys. B 349 (1991) 675; C.
Bortolotto, A. De Angelis and L. Lanceri, Nucl. Inst. and Methods A 306 (1991) 457;
L. Bellantoni et al., Nucl. Inst. and Methods A 310 (1991) 618.

[6] G. Marchesini, G. Nardulli and G. Pasquariello, Nucl. Phys. B 394 (1993) 541.

[7] P. Chiappetta, P. Colangelo, P. De Felice, G. Nardulli and G. Pasquariello, Phys. Lett.
B 322 (1994) 219.

[8] F. Anselmo et al., Nuovo Cim. 107 A (1994) 129.

[9] C. Bortolotto, A. De Angelis, N. de Groot and J. Seixas, Int. Journ. of Modern
Physics C 3 (1992) 733.

[10] P. Mazzanti and R. Odorico, Int. Journ. of Neural Systems 3 (1992) 243.

[11] B. Denby, The use of Neural Networks in High Energy Physics, in Neural
Computation 4 (5) (1993) 505.

[12] D. E. Rumelhart, G. E. Hinton and R. J. Williams, in Parallel Distributed Processing:
Explorations in the Microstructure of Cognition, MIT Press, Cambridge MA (1986).

[13] G. Anzellotti et al., Journ. of Mod. Phys. C 6 (1995) 555.

[14] R. Battiti and G. Tecchiolli, The Reactive Tabu Search, ORSA Journal on
Computing, 6 (2) (1994) 126.

[15] R. Battiti and G. Tecchiolli, "Training Neural Nets with the Reactive Tabu Search",
IEEE Transactions on Neural Networks, 6 (1995) 1185.

[16] ALEPH Collab., Phys. Lett. B 313 (1993) 299.

[17] M. Lüscher and P. Weisz, Nucl. Phys. B 290 (1987) 5; B 295 (1988) 65 and B 318
(1989) 705.

[18] M. Mangano and T. Trippe of the Particle Data Group, in Review of Particle
Properties, Phys. Rev. D 54 (1996) 309.

[19] H. U. Bengtsson and T. Sjöstrand, Computer Physics Commun. 46 (1987) 43; T.
Sjöstrand, CERN-TH.6488/92.

[20] S. Catani, Yu. L. Dokshitzer, M. Olsson, G. Turnock and B. R. Webber, Phys. Lett.
B 269 (1991) 432; S. Bethke, Z. Kunszt, D. E. Soper and W. J. Stirling, Nucl. Phys.
B 370 (1992) 310; N. Brown and W. J. Stirling, Zeit. Phys. C 53 (1992) 629; Yu.
L. Dokshitzer and M. Olsson, Nucl. Phys. B 396 (1993) 137.

[21] S. Catani, in Proc. of Int. Europhysics Conf. on High Energy Physics, Marseille 1993,
eds. J. Carr and M. Perrottet (Ed. Frontières, France) (1994) 771.

[22] P. De Felice, G. Nardulli and G. Pasquariello, Phys. Lett. B 354 (1995) 473.

[23] C.S. Lindsey and Th. Lindblad, "Experience with RTS as implemented in the TOTEM
chip", Proc. of ICNN96, Washington D.C., June 1996.






FIGURES. 



[Plot not reproduced; horizontal axis: Efficiency.]

Figure 2: The purity $P$ versus the Higgs efficiency $\eta$ for two different NN in the case
$M_H = 200$ GeV.